TITLE

NOVEL HUMAN α1 CHAIN COLLAGEN



BACKGROUND OF THE INVENTION 
Field of the Invention

The present invention relates to a novel human collagen
protein and a polynucleotide sequence that encodes the
novel human collagen protein. More particularly, it relates
to polynucleotides encoding human α1 chain collagen and
derivatives thereof.



Description of the Related Art

Collagens are structural proteins that participate in the
assembly of various kinds of polymers in the extracellular
matrix. Collagen polypeptides contain one or more blocks of
(Gly-X-Y) repeats, in which X represents any amino acid
residue and Y frequently represents a prolyl or
hydroxyprolyl residue. The presence of such sequence repeats
allows groups of three collagen polypeptides to fold into
triple-helical domains, which are rigid and inextensible.
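By way of illustration only, the (Gly-X-Y) repeat described
above can be located in a one-letter amino acid sequence with
a short script. The following Python sketch is not part of
the disclosed method. The function name gly_xy_blocks, the
threshold of three consecutive triplets, and the toy sequence
are assumptions made for this example. Hydroxyproline arises
post-translationally, so it appears as P in a translated
sequence.

```python
import re


def gly_xy_blocks(seq, min_repeats=3):
    """Return (start, end, n_triplets) for each run of Gly-X-Y triplets.

    start/end are 0-based, end-exclusive positions in the sequence.
    min_repeats is an arbitrary illustrative threshold.
    """
    # One triplet is a glycine followed by any two residues.
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_repeats)
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 3)
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical, not taken from SEQ ID NO. 1).
toy = "MKTLLAGPPGEPGKPGAPWWSTGPAGPK"
for start, end, n in gly_xy_blocks(toy):
    print(start, end, n, toy[start:end])
```

A run shorter than the threshold, such as the final two
triplets of the toy sequence, is not reported.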

So far, 20 distinct types of homo- and heterotrimeric
molecules, encoded by more than 30 genes, have been
identified in vertebrates. These proteins exhibit
considerable diversity in size, sequence, tissue
distribution, and molecular composition, and each plays a
different structural role in connective tissue.

Within the collagen superfamily, two categories are
distinguished. The fibrillar collagens include types I, II,
III, V, and XI collagen. The triple-helical domains of these
proteins polymerize in a staggered fashion to form fibrils.
Members of the other category do not by themselves form
cross-striated fibrils, but may be associated with fibrils
(FACIT, or fibril-associated collagens with interrupted
triple helices); they include types IX, XII, XIV, XVI, and
XIX collagen. The structure of these molecules comprises two
or more relatively short triple-helical (COL) domains
connected and flanked by non-triple-helical (NC) sequences.
Type IX collagen is the best-characterized member of this
group. Studies of transgenic mice with mutations in type IX
collagen have led to the proposal that it acts as a
molecular bridge between cartilage collagen fibrils and
other matrix components, perhaps proteoglycans. The COL
domains and the central NC domains of this molecule interact
with type II collagen through covalent cross-links to form
fibrils. The amino-terminal NC domain has the potential to
interact with other extracellular components. Also, in vitro
studies have demonstrated that the N-terminal
non-triple-helical domains of type XII and XIV collagen
promote contraction of collagen gels. However, the detailed
interactions underlying the bridging hypothesis are not
clear.

Client's Ref.: BMEC-89-24/ 10-30-2001
File: 0648-5921 -US/Final/ Frank/Chiumeow

Collagens are typical mosaic proteins containing a number
of shuffled domains. These domains have been classified by
sequence similarity in order to characterize their
structural and functional relationships to other proteins.
This analysis provides an overview of homologies of collagen
domains. It also reveals two new relationships: (i) a module
common to type V, IX, XI, and XII collagens was found to be
homologous to the heparin-binding domain of thrombospondin;
(ii) the modular architecture of a human type VII collagen
fragment was identified. Its N-terminal globular domain
contains fibronectin type III repeats located adjacent to a
von Willebrand factor type A module. The proposed structural
similarities point to analogous subfunctions of the
respective domains in otherwise distinct proteins.

Thrombospondin is one of a class of adhesive homotrimeric
glycoproteins that mediate cell-to-cell and cell-to-matrix
interactions. It is expressed in the extracellular matrix
and may have autocrine growth-regulatory properties. It is
involved in platelet aggregation, embryogenesis, and
morphogenesis, acts as a cell adhesion molecule, and is a
major activator of TGF-β1. The von Willebrand factor A
(vWF)-like domain is the prototype for a protein superfamily
and is found in various proteins including plasma complement
factors, integrins, collagens, and other extracellular
proteins. Proteins that incorporate vWF domains participate
in numerous biological events, such as cell adhesion,
migration, homing, pattern formation, and signal
transduction.

Collagens are important biomedical building blocks used in
tissue growth, anaplasty, burn dressings, wound healing, and
the like, and the demand for them is expanding greatly.
Therefore, there is still a need to develop novel collagens
and derivatives thereof having greater therapeutic value and
diversity for these various applications.

The present inventors have successfully cloned a novel
human α1 chain collagen gene by using known human expressed
sequence tags (ESTs) in combination with bioinformatics and
molecular cloning techniques. When the inventive collagen is
compared with the 20 existing collagens, the highest
sequence homology is less than 30%, indicating that the
collagen of the invention is a novel form.

Blood vessels are tubes of endothelial cells surrounded by
layers of smooth muscle cells and connective tissue
proteins. During development this complex structure forms as
a result of biochemical signals between endothelial cells
and smooth muscle cells. Sometimes this biochemical
communication fails and abnormal blood vessels form. By
analyzing gene mutations causing such vascular
abnormalities, one can learn about the signals necessary for
normal blood vessel development. In addition, identification
of genes responsible for inherited vascular malformations
provides a basis for development of rational therapies in
the clinical treatment of vascular disorders.

SUMMARY OF THE INVENTION

It is therefore the primary object of the present invention
to provide an isolated nucleic acid (hCOLA1) and the
degenerate sequences thereof, which encodes human α1 chain
collagen protein, comprising the nucleotide sequence set
forth in SEQ ID NO. 5. The present invention also provides
the expression profile of the isolated collagen gene and the
exact tissue and cellular localization of this collagen
protein. Moreover, the present invention provides nucleotide
fragments derived from SEQ ID NO. 5 as a nucleic acid probe
or primer.

In one preferred embodiment, the present invention provides
a novel human α1 chain collagen protein encoded by the
nucleic acid mentioned above, which has the amino acid
sequence set forth in SEQ ID NO. 1.

Another aspect of the present invention provides a
recombinant vector comprising the nucleic acid mentioned
above and a regulatory sequence.

Still another aspect of the present invention provides a
method for producing human α1 chain collagen protein,
comprising the steps of: (a) transforming or transfecting a
host cell with the recombinant vector described above; (b)
culturing said transformed or transfected cell under
conditions sufficient for expression of the human α1 chain
collagen protein; and (c) recovering and purifying the human
α1 chain collagen protein.

Yet still another aspect of the present invention provides
a diagnostic kit for detecting a disease related to a
mutation of SEQ ID NO. 5 in a mammal or human, comprising
the nucleic acid probe or primer described above.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be more fully understood and 
further advantages will become apparent when reference is 
made to the following description of the invention and the 
accompanying drawings in which: 

FIG. 1 is a diagram showing the PCR cloning of the human α1
chain collagen cDNA of the invention, wherein lane 1 is the
molecular weight markers; lane 2 is the negative control,
which lacks human cDNA; and lane 3 is the PCR product
containing the cDNA coding for the human α1 chain collagen
of the invention.

FIG. 2 is a diagram showing the construct of the
recombinant vector Bluescript KS(+)/E. coli (hCOLA1) of the
present invention.

FIG. 3 is a diagram showing the complete nucleotide
sequence (SEQ ID NO. 5) and the corresponding amino acid
sequence (SEQ ID NO. 1) of the human α1 chain collagen of
the invention.

FIG. 4 is a diagram showing the hydropathy profile of the
deduced amino acid sequence. Kyte-Doolittle hydrophobicity
profile of the human α1 chain collagen, plotted with an
11-residue window.

FIG. 5 is a schematic diagram showing the domain structure
of the human α1 chain collagen protein of the invention.

FIG. 6 (A-C) is a diagram showing the amino acid sequence
comparison of the human α1 chain collagen protein of the
invention with type IX (Col9a1) and type XIX (Col19a1)
collagens, wherein black boxes indicate identical amino acid
residues and the symbol "-" indicates a gap inserted for
optimal alignment. The lower panel of FIG. 6 also shows the
evolutionary tree of these collagens.

FIG. 7(A) is a Northern blot containing 2 μg of poly(A)+
RNA from the indicated tissues hybridized with a human α1
chain collagen cDNA-specific probe, and with a human
β-actin-specific probe as an internal control. FIG. 7(B) is
a Northern blot containing 2 μg of poly(A)+ RNA from the
indicated cardiovascular tissues hybridized with a human α1
chain collagen cDNA-specific probe, and with a human
glyceraldehyde 3-phosphate dehydrogenase (GAPDH) probe as an
internal control.

FIG. 8 is a quantitative RT-PCR analysis of the expression
of human α1 chain collagen in human fetal and adult tissues.
Human glyceraldehyde 3-phosphate dehydrogenase was used as
an internal control.

FIG. 9 is an in situ hybridization analysis of human α1
chain collagen mRNA expression. Cardiovascular sections and
cells were hybridized with digoxigenin-labeled antisense
riboprobes for human α1 chain collagen. (A) Longitudinal
section, artery; (B) longitudinal section, ventricle; and
(C) aortic smooth muscle cells. Control hybridizations
labeled with sense probes did not produce signals (data not
shown). Bar, 10 μm.

FIG. 10 is a diagram showing the expression of the human α1
chain collagen protein in E. coli. FIG. 10(A) shows SDS-PAGE
analysis, wherein the numbers indicated are molecular weight
standards; lane 1 is the non-induced cell lysate; lane 2 is
the cell lysate induced by IPTG for 2 hours; and lane 3 is
the cell lysate induced for 3 hours. In FIG. 10(B), lane 1
shows the human α1 chain collagen protein purified by
Ni-column and stained with Coomassie brilliant blue; and
lanes 2 and 3 are a western blot probed with anti-histidine
antibody, wherein lane 2 is the non-induced cell lysate and
lane 3 is the cell lysate induced by IPTG for 2 hours
without purification.

FIG. 11 is a diagram showing RT-PCR of the recombinant
expression of human α1 chain collagen in COS7 cells, wherein
"-" refers to the negative control; "+" is the RT-PCR
product from transformants containing the human α1 chain
collagen gene; and the numbers indicated are molecular
weight standards.
DETAILED DESCRIPTION OF THE INVENTION

In the present invention, the most conserved regions of
the known collagen nucleic acid sequences were screened
against a human expressed sequence tag (EST) library. A
novel biomolecule with the highest homology of primary amino
acid sequence was then found by querying the EST library
with the sequence of that region. A full-length cDNA
sequence of the novel biomolecule was then obtained and
determined by bioinformatics and molecular cloning
techniques.

At the beginning, a 57-bp fragment of the conserved region
was aligned against the human EST library to obtain a
fragment about 300 bp in length. The fragment was then
submitted to GenBank BLAST to search the human non-redundant
sequences, yielding a fragment about 146 kb in length
containing exons and introns. Possible open reading frames
were analyzed, and corresponding oligonucleotide probes were
designed to clone the novel full-length human α1 chain
collagen. The cloning method is further described in the
following examples.

The nucleic acid sequence of the full-length human α1 chain
collagen (hCOLA1) gene and the deduced amino acid sequence
thereof are shown in FIG. 3 (SEQ ID NO. 5 and SEQ ID NO. 1,
respectively). The novel human α1 chain collagen gene
comprises 2,865 bp, which encodes 954 amino acids with a
molecular weight of about 99,000 Da, and is located at
p11.2-12.3 on human chromosome XI.
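The figures above can be checked against one another with
simple arithmetic. The following Python sketch assumes that
the 2,865 bp figure counts the open reading frame including
its stop codon; that reading is an assumption of this
example, and the variable names are illustrative.

```python
ORF_BP = 2865        # length of the coding sequence stated above
RESIDUES = 954       # number of amino acids stated above
MW_DA = 99_000       # approximate molecular weight stated above

# A complete reading frame must be a whole number of codons.
codons, remainder = divmod(ORF_BP, 3)
assert remainder == 0

# Assumption: the final codon is a stop codon and encodes no residue.
coding_codons = codons - 1

# Average mass per residue implied by the stated molecular weight.
mean_residue_mass = MW_DA / RESIDUES

print(codons, coding_codons, round(mean_residue_mass, 1))
```

Under that assumption the 955 codons comprise 954 sense
codons plus one stop codon, which agrees with the stated
protein length.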

The above isolated nucleic acid (hCOLA1 gene) comprises at
least the nucleotide sequence set forth in SEQ ID NO. 5
(including DNA and RNA sequences) or the complementary
sequences thereto, and the genomic DNA sequence. Those
skilled in this art will be aware that the nucleotide
sequences can be modified in accordance with any method
known in the art, and that such modified sequences are also
within the scope of the invention. For example, degenerate
codons can be substituted at the corresponding positions so
that the gene still encodes the same amino acid sequence.
Further, additional codons can be inserted into the sequence
or added at either the 3'- or 5'-end, provided that the
activity of the protein is not affected or is only slightly
affected. Accordingly, the complementary sequences and
degenerate sequences of SEQ ID NO. 5, and various modified
variants, are included in the present invention. See, for
example, Sambrook, et al., Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Laboratory Press, New York, 1989.
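The effect of degenerate codon substitution can be shown
with a minimal Python sketch using the standard genetic code.
The two 12-nucleotide sequences below are toy examples
invented for illustration and are not taken from SEQ ID NO.
5; the function name translate is likewise an assumption of
this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequences: same protein, different (degenerate) codons.
original = "GGTCCTCCTGGA"
variant = "GGCCCACCGGGG"

print(translate(original), translate(variant))
```

Both sequences translate to the same peptide although they
differ at the third position of every codon.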

In accordance with the present invention, the human α1
chain collagen encoded by the isolated hCOLA1 gene comprises
three domains (as shown in FIG. 5), including (i) a von
Willebrand factor A domain (having the amino acid sequence
set forth in SEQ ID NO. 2); (ii) a thrombospondin
N-terminal-like domain (having the amino acid sequence set
forth in SEQ ID NO. 3); and (iii) a collagenous domain
(having the amino acid sequence set forth in SEQ ID NO. 4).
Based on analysis of the primary sequence of the protein and
comparison with the other 20 known collagens, the human α1
chain collagen of the present invention belongs to the FACIT
family (fibril-associated collagens with interrupted triple
helices). From the domains described above, it is inferred
that the physiological functions of the human α1 chain
collagen of the invention may involve platelet aggregation,
cell adhesion, and the activation of transforming growth
factor. In addition, the N-terminus of the collagen protein
further includes a signal peptide 22 amino acids in length.
It is therefore inferred that the human α1 chain collagen of
the invention is located in the extracellular matrix.

The present invention also provides a recombinant vector
comprising the nucleic acid cloned and isolated above, and
optionally a regulatory sequence, such as a replication
region, a selection marker (e.g. an antibiotic resistance
marker), and a promoter. The promoter used herein can be a
eukaryotic cell promoter or a prokaryotic cell promoter so
that the recombinant vector can be expressed in a suitable
host, for example, eukaryotic cells such as mammalian cells
or yeast, or prokaryotic cells such as Escherichia coli.

In one preferred embodiment, the present invention provides
a recombinant vector in which the hCOLA1 gene is cloned into
the Bluescript KS(+) vector (Stratagene). The recombinant
vector was deposited at the Culture Collection and Research
Center (Hsinchu, Taiwan) on November 14, 2000, and assigned
accession number CCRC 940331.

One can produce the novel human α1 chain collagen protein
which has the amino acid sequence set forth in SEQ ID NO. 1
using the isolated nucleic acid described above by any
suitable method in any suitable expression system known in
this art. Therefore, the method for producing human α1 chain
collagen protein is also within the scope of the present
invention.

One preferred expression system for the recombinant
production of the collagen of the invention is in transgenic
non-human animals, wherein the desired collagen may be
recovered from the milk of the transgenic animal. Such a
system is constructed by operably linking the DNA sequence
encoding the collagen of the invention to a promoter and
other required or optional regulatory sequences capable of
effecting expression in the mammary gland. Likewise,
required or optional post-translational enzymes may be
produced simultaneously in the target cells, employing a
suitable expression system operable in the targeted milk
protein-producing mammary gland cells.

In one preferred embodiment of the present invention, the
nucleic acid of SEQ ID NO. 5 is subcloned into an expression
vector to obtain another recombinant vector. A suitable host
cell (for example, a eukaryotic or prokaryotic cell) is then
transformed or transfected with the recombinant vector. The
transformed or transfected cells are then cultured under
conditions sufficient for expression of the human α1 chain
collagen protein. Finally, the expressed proteins are
recovered and purified. Those skilled in this art will
appreciate that the recovering and purifying method is not
limited and may be, for example, any of various
chromatographic methods. Preferably, the human α1 chain
collagen is expressed as a histidine-tagged fusion protein,
and the recovering and purifying are performed on an
affinity column.

As used herein, the term "transformation" or "transfection"
includes a variety of techniques for introducing an
exogenous nucleic acid into a cell (for example, eukaryotic
or prokaryotic), including calcium phosphate or calcium
chloride precipitation, microinjection,
DEAE-dextran-mediated transfection, lipofection, or
electroporation.

Electroporation is carried out at appropriate voltage and
capacitance (and corresponding time constant) to result in
entry of the DNA construct(s) into the host cells.
Electroporation can be carried out over a wide range of
voltages (e.g. 50 to 2,000 volts) and corresponding
capacitance. Total DNA of approximately 0.1 to 500 μg is
generally used.

Methods such as calcium phosphate precipitation and
polybrene precipitation, liposome fusion, and
receptor-mediated gene delivery can also be used to
transfect cells.

The genetic engineering methods mentioned above, such as
DNA modification, cloning, construction and isolation of the
recombinant vector, protein expression, and purification,
can be accomplished by those skilled in this art, and are
described in, for example, Ausubel F. M., et al., Current
Protocols in Molecular Biology, New York, 1992; Sambrook, et
al., supra; or Davis, L. G., Methods in Molecular Biology,
Elsevier, Amsterdam, NL, 1986.

In one aspect of the present invention, the isolated
nucleic acid further includes fragments derived from SEQ ID
NO. 5 or the complementary sequences thereto for use as a
nucleic acid probe or primer for detection. Those skilled in
the art will be aware that the length of the nucleic acid
fragment is not limited. For example, as a nucleic acid
probe, the fragment preferably comprises at least 500
contiguous nucleotides derived from SEQ ID NO. 5, while as a
nucleic acid primer, the fragment preferably comprises at
least 20 contiguous nucleotides derived from SEQ ID NO. 5.
The selection of the fragment length depends upon the
conditions of the detection method as described below, for
example, the temperature and ionic strength used in
hybridization, or the temperature used in polymerase chain
reaction (PCR). Generally, the specificity of the detection
result increases with the length of the nucleic acid probe.
Accordingly, the nucleic acid probe preferably comprises at
least 500 contiguous nucleotides derived from SEQ ID NO. 5,
and more preferably comprises the full-length nucleic acid
of SEQ ID NO. 5. Likewise, the specificity of the detection
result increases with the length of the nucleic acid primer.
Accordingly, the nucleic acid primer preferably comprises at
least 20 contiguous nucleotides derived from SEQ ID NO. 5,
and more preferably comprises 20-25 contiguous nucleotides,
thereby increasing the specificity of the detection result.
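As a hedged illustration of how a candidate primer might be
screened against the preferred 20-25 nucleotide range, the
following Python sketch reports length, GC fraction, and a
rough melting temperature. The Wallace rule used here
(2 °C per A/T plus 4 °C per G/C) is a common rule of thumb
intended for short oligonucleotides; it is introduced for
this example and is not a method stated in this
specification. The function names are hypothetical. The
sequence is the hCOLA1 reverse primer given in Example 4.

```python
def primer_stats(primer):
    """Return length, GC fraction and Wallace-rule Tm (deg C) of a primer."""
    s = primer.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return {
        "length": len(s),
        "gc_fraction": gc / len(s),
        # Rule-of-thumb estimate only; not a substitute for measurement.
        "wallace_tm_c": 2 * at + 4 * gc,
    }


def in_preferred_range(primer, lo=20, hi=25):
    """True if the primer length lies in the preferred 20-25 nt range."""
    return lo <= len(primer) <= hi


reverse_primer = "AGTCCACGATCACCCTTGTCAC"
print(primer_stats(reverse_primer), in_preferred_range(reverse_primer))
```

In practice the annealing temperature would be determined
empirically or with a nearest-neighbor model rather than
from this estimate.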

The human α1 chain collagen polynucleotide of the present
invention may be used for diagnostic and/or therapeutic
purposes. For diagnostic uses, the polynucleotide of the
invention may be used to detect human α1 chain collagen gene
expression or aberrant α1 chain collagen gene expression in
disease states, e.g., rheumatoid arthritis, osteoarthritis,
reactive arthritis, autoimmune hearing disease, cartilage
inflammation due to bacterial or viral infections (e.g.
Lyme's disease), parasitic disease, bursitis, corneal
diseases, ankylosing spondylitis (fusion of the spine), and
cardiovascular disease.

The present inventors suggest that the novel collagen is
derived from blood vessels and may be related to
cardiovascular disease. Analysis of gene mutations causing
cardiovascular abnormalities can provide a basis for
development of rational therapies in the clinical treatment
of cardiovascular disorders.

The kit of the present invention used for detecting such
diseases comprises a probe or primer described above.
Methods for detecting the expression of the hCOLA1 gene by
using the nucleic acid probe include, but are not limited
to, Northern analysis, Southern analysis, in situ
hybridization, and bio-chip/microarray analysis, which are
well known in the art. Those skilled in the art will
appreciate that methods using the complementary properties
between two nucleic acid molecules are within the scope of
the present invention. In addition, methods for detecting
the expression of the hCOLA1 gene by using the nucleic acid
primer include, but are not limited to, reverse
transcriptase polymerase chain reaction (RT-PCR), 5'-Rapid
Amplification of cDNA Ends (5'-RACE), and 3'-RACE. Those
skilled in the art will appreciate that methods using at
least one primer in combination with PCR are also within the
scope of the present invention.

The human α1 chain collagen gene and/or protein of the
present invention may be useful in the treatment of various
abnormal conditions. By introducing gene sequences into
cells, gene therapy can be used to treat conditions in which
the cells underexpress normal α1 chain collagen or express
abnormal/inactive α1 chain collagen. In some instances, the
polynucleotide sequence encoding the human α1 chain collagen
of the invention is intended to replace or act in the place
of a functionally deficient endogenous gene. Alternatively,
abnormal conditions characterized by overproliferation can
be treated using the antisense of the human α1 chain
collagen coding sequence of the invention. Recombinant gene
therapy vectors, such as viral vectors, may be engineered to
express the human α1 chain collagen of the invention. Thus
recombinant gene therapy vectors may be used therapeutically
for treatment of diseases resulting from aberrant expression
or activity of the human α1 chain collagen of the invention.

Without intending to limit it in any manner, the 
present invention will be further illustrated by the 
following examples. 

EXAMPLE 

Example 1. cDNA cloning of hCOLA1

A Clontech SMART RACE cDNA Amplification kit was used to
clone hCOLA1 cDNA. Sequence-specific primers used for the
following RACE reactions were deduced either from the
previously published partial human genomic clone 682J15
(GenBank Accession No. AL034452) or from the cloned hCOLA1
cDNA fragment. Initially, first strand cDNA was synthesized
from 1 μg of total RNA pool (Clontech) using SuperScript II
reverse transcriptase with a specific primer
5'-GGTTCACCTTTGCTTCCCTTAG-3', deduced from the clone 682J15.
The reaction followed the manufacturer's protocol. The above
reverse transcription reaction mixture was used for the
5' RACE reaction with a sequence-specific primer
(5'-TTGGCCCATTAATCCTCGGTTTC-3'), corresponding to
nucleotides 1823-1845 of the hCOLA1 cDNA, and the universal
primer provided by the kit. All assays were performed in a
50-μl reaction volume using the GeneAmp PCR system 9600
(Perkin-Elmer Cetus).
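The stated coordinates of the 5' RACE primer can be checked
for internal consistency with a short Python sketch. A 5'
RACE gene-specific primer is conventionally antisense to the
mRNA, so the sketch also derives the sense-strand reading by
reverse complementation; the antisense orientation is an
assumption of this example, since the orientation is not
stated explicitly above. The names used are illustrative.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


RACE_PRIMER = "TTGGCCCATTAATCCTCGGTTTC"
CDNA_START, CDNA_END = 1823, 1845  # coordinates quoted in Example 1

# Inclusive coordinate span should equal the primer length.
span = CDNA_END - CDNA_START + 1
sense_strand = reverse_complement(RACE_PRIMER)

print(len(RACE_PRIMER) == span)
print(sense_strand)
```

The primer length matches the 23-nucleotide span of
positions 1823-1845. The first 17 nucleotides of the derived
sense-strand reading are identical to the last 17
nucleotides of the forward hCOLA1 primer listed in Example 4.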
To obtain the entire coding region of the hCOLA1 gene,
first strand cDNA was synthesized from 1 μg of total RNA
pool (Clontech) using SuperScript II reverse transcriptase
with an oligo dT primer. After reverse transcription, 1 μl
of the reaction mixture was used for PCR amplification with
an upstream primer (5'-ATTCCTGGGCCACCTGGTCCGATA-3'),
corresponding to the most 5' candidate initiator methionine
of the clone 708F5 (GenBank Accession No. AL031782), and a
downstream primer (5'-CTAATAGTTTGGTCCTTTTCT-3'),
corresponding to the 3' stop codon of the clone 682J15. A
single band with a molecular size of 2.9 kilobases was
obtained (FIG. 1). The band was excised from the gel and
cloned into the Bluescript II KS(+) vector (Stratagene). The
recombinant vector was deposited at the Culture Collection
and Research Center (Hsinchu, Taiwan) and assigned accession
number CCRC 940331. After nucleotide sequence analysis, the
PCR product was found to contain the entire open reading
frame of hCOLA1.

Example 2. Nucleotide sequencing 

Nucleotide sequencing was carried out with the Sanger
dideoxynucleotide chain termination method (Sambrook, et
al., 1989). The sequence samples were prepared using the
AmpliTaq cycle sequencing kit (Perkin-Elmer, Inc.) following
the manufacturer's protocol. The samples were applied to a
377 automatic sequencer (Perkin-Elmer, Inc.). All reported
sequences were confirmed by sequencing of both sense and
antisense strands. The full-length nucleotide sequence (SEQ
ID NO. 5) and the deduced amino acid sequence (SEQ ID NO. 1)
of the human α1 chain collagen of the invention are shown in
FIG. 3.

Example 3. Northern blot analysis

The human multiple tissue and cardiovascular Northern
blots, containing 2 μg of poly(A)+ RNA from the indicated
tissues, were obtained from Clontech (catalog numbers 7780-1
and 7791-1, respectively). Each blot was hybridized with a
randomly primed 32P-labeled probe corresponding to
nucleotides 1236-1863 of the hCOLA1 open reading frame at
60 °C in ExpressHyb solution for one hour and washed twice
with 2x SSC/0.1% SDS for 15 min each at 60 °C. The blot was
then washed three times with 0.2x SSC/0.1% SDS for 15 min
each at 60 °C. A human β-actin or GAPDH probe was used as a
control for the amount of RNA in each lane. As shown in FIG.
7A, a transcript of approximately 4.3 kb was observed, in
agreement with the size of the cloned cDNA. The expression
of hCOLA1 collagen is mostly confined to placenta and heart
tissues, with lower levels in skeletal muscle, small
intestine, liver, and lung. Another transcript of
approximately 2.4 kb hybridized with the probe in heart
tissue. It is probably a splicing variant of the hCOLA1
gene. We further examined the expression pattern of hCOLA1
in human cardiovascular tissues, including fetal heart and
adult heart tissues together with aortic and cardiac
tissues, by Northern blot analysis. Surprisingly, the hCOLA1
transcripts were present only in fetal heart and aortic
tissues (FIG. 7B). Moreover, the 2.4 kb short transcript was
present only in the fetal heart. Another 7.3 kb band was
detected in both tissues. We do not know whether this is an
additional splicing variant of the hCOLA1 gene. No
hybridization signal was detected in adult heart and cardiac
tissues. Although the absence of hCOLA1 transcript in adult
heart is inconsistent with the Northern blot data in FIG.
7A, the hCOLA1 mRNA level in fetal heart is 22-fold higher
than in adult heart based on the quantitative RT-PCR results
in FIG. 8 (see below). The presence of the hCOLA1
transcripts in aorta suggests that this novel collagen is
derived from blood vessels.

Example 4. Quantitative RT-PCR 

Five micrograms of total RNA from a variety of human fetal
and adult tissues obtained from Clontech (catalog number
K4005-1) were used for reverse transcription reactions with
oligo(dT) primers. After the reverse transcription
reactions, the relative quantity of endogenous GAPDH mRNA in
each tissue sample was determined with SYBR Green
fluorescent dye (Molecular Probes) using real-time PCR
analysis (LightCycler, Roche Molecular Biochemicals). The
resulting GAPDH mRNA value in each tissue sample was used to
normalize the sample for differences in the amount of total
RNA added to each PCR reaction. Each of the normalized
tissue samples was then split to perform the target α1(XXI)
collagen and control GAPDH amplifications by real-time PCR
analysis. The relative quantity of α1(XXI) collagen cDNA in
each reaction was determined in the exponential phase to
ensure that the amount of product amplified reflects the
quantity of starting mRNA. Primers used for PCR
amplifications are as follows: GAPDH
(5'-TGAAGGTCGGAGTCAACGGATTTGGT-3' and
5'-CATGTGGGCCATGAGGTCCACCAC-3'; 983-bp fragment); hCOLA1
collagen (5'-TTCCTGGAAACCGAGGATTAATG-3' and
5'-AGTCCACGATCACCCTTGTCAC-3'; 1546-bp fragment). Meanwhile,
samples at a PCR cycle in the linear range of amplification
(30 cycles for hCOLA1; 20 cycles for GAPDH) were
electrophoresed on 1.5% agarose and stained with ethidium
bromide for visualization. As shown in FIG. 8, when
normalized to the GAPDH values, the relative amounts of
hCOLA1 transcripts were 2.7, 22, and 30 times higher in
fetal brain, heart, and liver, respectively, than in the
adult counterparts. The results indicate that hCOLA1
expression is developmentally regulated and suggest a role
for α1(XXI) collagen in developmental processes in multiple
tissues. Comparison of hCOLA1 expression in different adult
tissues reveals high levels of hCOLA1 expression in trachea,
testis, uterus, and placenta, with modest levels of
expression in brain, lung, colon, prostate, spinal cord, and
salivary gland. The hCOLA1 collagen mRNA expression was very
low or undetectable in adult heart, liver, kidney, bone
marrow, spleen, thymus, skeletal muscle, and adrenal gland.
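The normalization described in this example can be sketched
in a few lines of Python. The sketch assumes the simplest
reading, namely that each target quantity is divided by the
GAPDH quantity of the same sample and the fetal-to-adult
ratio of the normalized values is then taken. The function
names and the input numbers are hypothetical and are not
measurements from FIG. 8.

```python
def normalized(target_qty, gapdh_qty):
    """Target quantity expressed relative to GAPDH in the same sample."""
    if gapdh_qty <= 0:
        raise ValueError("GAPDH quantity must be positive")
    return target_qty / gapdh_qty


def fold_change(fetal, adult):
    """Fetal/adult ratio of GAPDH-normalized target quantities.

    Each argument is a (target_qty, gapdh_qty) pair in arbitrary units.
    """
    return normalized(*fetal) / normalized(*adult)


# Made-up relative quantities, for illustration only.
fetal_sample = (8.0, 2.0)
adult_sample = (1.0, 0.5)
print(fold_change(fetal_sample, adult_sample))
```

Dividing by GAPDH first removes differences in the amount of
input RNA, so the resulting ratio reflects relative
transcript abundance rather than loading.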

Example 5. In situ hybridization analysis 

In situ hybridization was performed on 5-µm human
cardiovascular tissue sections (Novagen, catalog number
70316-3). An antisense or sense RNA probe labeled with
digoxigenin-UTP (DIG-UTP), encompassing the region
corresponding to nucleotides 1236-1547 of the hCOLA1 open
reading frame, was obtained by in vitro transcription
(Boehringer RNA labeling kit). Sections were dewaxed by
washing three times for 5 min in xylene. After dewaxing,
sections were rehydrated to PBS through an ethanol series,
washed three times in PBS, and then incubated for 15 min in
a proteinase K solution (10 µg/ml in PBS). Proteinase K
activity was stopped by washing twice in PBS, and sections
were refixed at RT for 30 min in 4% paraformaldehyde, 0.2%
glutaraldehyde in PBS. After fixation, sections were washed
twice in PBS and then incubated for 1 h at 50 °C in
pre-hybridization mix (50% formamide, 5X SSC, 50 µg/ml yeast
tRNA, 0.1% SDS and 50 µg/ml heparin). The pre-hybridization
mix was then replaced with hybridization mix containing the
probe, and the sections were incubated at 50 °C
overnight. After hybridization, sections were washed twice
for 30 min at 50 °C in solution I (50% formamide, 5X SSC, and
0.1% SDS) and twice for 30 min at 50 °C in solution II (50%
formamide, 2X SSC, and 0.1% SDS). Sections were washed
three times at RT in MAB (100 mM maleic acid, 150 mM NaCl,
pH 7.5) and then blocked for 2 h at RT with 2% blocking
reagent (Boehringer) in MAB. Sections were incubated for 2
h at RT with 2% blocking reagent in MAB containing a 1:2000
dilution of anti-DIG antibody. The sections were washed 4
times for 15 min at RT in MAB-Tween (0.1% Tween-20) and then
three times for 5 min in AP buffer (0.1 M Tris-HCl, pH 9.0,
50 mM MgCl2, 0.1 M NaCl, and 0.1% Tween). Color was
developed by incubating the sections in NBT/BCIP. After
developing, sections were washed in PBS, counterstained
with nuclear fast red, and then mounted with Histomount
Mounting Solution (Zymed). In situ hybridization of cells
grown on coverslips was performed according to the
previously published protocol.

Example 6. Expression of hCOLA1 in Escherichia coli and
purification

The entire coding region of the hCOLA1 cDNA was
amplified by PCR with primers 5'-ATGGCTCACTATATTACATTTCTC-3',
corresponding to the 5' cDNA region, and 5'-
TTAGTGATGGTGATGGTGATGCTCATAGTTTGGTCCTTTTCTG-3',
corresponding to the 3' region and encoding 6 histidine
residues immediately before the stop codon. The amplified DNA
construct was gel purified and sub-cloned into the
expression vector pET-15b (Novagen), in which the Nco I site
was digested and blunted with Klenow fragment. The
recombinant protein was obtained by expressing the
construct in E. coli strain BL21(DE3). The transformed E.
coli was cultured in LB medium containing 100 µg/ml of
ampicillin at 37 °C to an optical density of 0.7 at
600 nm, followed by induction with IPTG at a final
concentration of 1 mM, and culturing was continued for an
additional 2 or 3 hours. The cell lysate with total proteins was
analyzed by SDS-PAGE. The result is shown in FIG. 10(A).
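
As a quick check of the tagging strategy, the reverse primer can be reverse-complemented and translated to see what it appends to the C-terminus. The following is a minimal Python sketch; the helper names are illustrative, the standard genetic code is assumed, and only the primer sequence itself is taken from the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; '*' marks a stop codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


REVERSE_PRIMER = "TTAGTGATGGTGATGGTGATGCTCATAGTTTGGTCCTTTTCTG"

coding_strand = reverse_complement(REVERSE_PRIMER)
# The primer is 43 nt long, so one leading base is skipped in order to
# read whole codons that end with the stop codon at the 3' end.
offset = len(coding_strand) % 3
print(translate(coding_strand[offset:]))  # RKGPNYEHHHHHH*
```

Read this way, the primer encodes the last six residues of SEQ ID NO: 1 (RKGPNY), one additional glutamate, six histidines, and the stop codon.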

One liter of the IPTG-induced E. coli culture was
grown for 2 hours and then centrifuged at 5000 x g for 30
min. The cell pellet was washed with PBS and centrifuged
again. The cell pellet was then re-suspended in 20 ml of
PBS containing 1 mM PMSF. The cell suspension was
subjected to ultrasonication to break the cell walls. The
cell lysate was then centrifuged at 30,000 x g for 40 min.
The supernatant was applied to a Ni-agarose column (5 ml
bed volume) that had been equilibrated with 50 mM
Tris-HCl buffer, pH 8.0, at a flow rate of 0.5 ml/min. The column
was washed with the same buffer containing 40 mM
imidazole. The recombinant hCOLA1 was eluted with the same
buffer containing 0.25 M imidazole. The eluate was
quantified and analyzed by SDS-PAGE, followed by staining
with Coomassie brilliant blue. A protein band with a
molecular weight of 98 kDa was observed on the gel (FIG. 10(B),
lane 1). In addition, the unpurified proteins
were blotted onto a PVDF membrane. An antibody to the histidine
tag (Clontech) was used to detect the recombinant protein.
The result of the Western blot is shown in FIG. 10(B), lanes 2
and 3, in which the band indicated at 98 kDa corresponds to
the human α1 chain collagen protein of the invention.

Example 7. Expression of hCOLA1 in eukaryotic cells

The hCOLA1 cDNA containing the entire open reading frame
prepared in Example 4 was gel purified and sub-cloned into
the expression vector pcDNA3.1 containing the CMV promoter
(Invitrogen), in which the Pme I site was digested and
blunted with Klenow fragment. Mammalian COS7 cells were
transfected with the expression vector via Superfect
(Qiagen) and cultured in DMEM supplemented with 10% FBS
(Life Technologies) for 48 hours. About 10^6 cells were used
for the extraction of total RNA. Reverse transcription
was performed with an oligo dT primer using 0.2 µg of RNA as
template. After the reaction, PCR was carried out with primers
T7 and BGHrev on the pcDNA3.1 vector using 0.5 µl of the solution.
The result is shown in FIG. 11, indicating that the vector
is expressed in the transfected mammalian cells.

Referring to FIG. 4, the first 22 amino acid residues,
indicated by a solid bar, constitute a putative signal peptide
characteristic of secreted proteins. It is therefore inferred
that the human α1 chain collagen protein of the invention is
located in the extracellular matrix.

The amino acid sequence of the human α1 chain collagen
protein of the invention was compared with those of 20 other
known collagens, particularly type IX and type XIX, which are
the most similar in structure. The amino acid sequence identity
between type IX collagen and hCOLA1 of the invention is 24%,
while that between type XIX collagen and hCOLA1 of the invention is
27%, indicating that hCOLA1 of the invention is a novel form
of collagen (FIG. 6).
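
A percent-identity figure of this kind can be computed from a pairwise alignment as sketched below in Python. The text does not state which alignment program or which denominator was used, so this sketch assumes identity is counted over aligned columns in which neither sequence has a gap; the function name and the two short aligned fragments are hypothetical and are not taken from any collagen alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical pre-aligned fragments, used only to illustrate the calculation.
identity = percent_identity("GKPGLQGPKG", "GEPG-QGAKG")
print(round(identity, 1))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give somewhat different percentages for the same alignment.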

While the invention has been particularly shown and
described with reference to the preferred embodiment
thereof, it will be understood by those skilled in the art
that various changes in form and details may be made without
departing from the spirit and scope of the invention.

SEQUENCE LISTING

SEQ ID NO: 1

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 954 amino acids

(B) TYPE: amino acid

(ii) MOLECULE TYPE: protein

(iii) FEATURE:

(A) NAME: Alpha 1 chain collagen

(B) OTHER INFORMATION: /note="Where P=P*=Hydroxyproline"

MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGP 50
ENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHL 100
TAAVESILYLGGNTKTGKAIQFALDYLFAKSSRFLTKIAVVLTDGKSQDD 150
VKDAAQAARDSKITLFAIGVGSETEDAELRAIANKPSSTYVFYVEDYIAI 200
SKIREVMKQKLCEESVCPTRIPVAARDERGFDILLGLDVNKKVKKRIQLS 250
PKKIKGYEVTSKVDLSELTSNVFPEGLPPSYVFVSTQRFKVKKIWDLWRI 300
LTIDGRPQIAVTLNGVDKILLFTTTSVINGSQVVTFANPQVKTLFDEGWH 350
QIRLLVTEQDVTLYIDDQQIENKPLHPVLGILINGQTQIGKYSGKEETVQ 400
FDVQKLRIYCDPEQNNRETACEIPGFCLNGPSDVGSTPAPCICPPGKPGL 450
QGPKGDPGLPGNPGYPGQPGQDGKPGYQGIAGTPGVPGSPGIQGARGLPG 500
YKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGDKGSPGFYGKKGAK 550
GEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQDGTRGE 600
PGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQPGTPG 650
SKGSKGEPGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKGDKGNQ 700
GEKGIQGQKGENGRQGIPGQQGIQGHHGAKGERGEKGEPGVRGAIGSKGE 750
SGVDGLMGPAGPKGQPGDPGPQGPPGLDGKPGREFSEQFIRQVCTDVIRA 800
QLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEGPRGLPGLPGRDGV 850
PGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPEGPPGI 900
SKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARRDPFRK 950
GPNY 954

SEQ ID NO: 2

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 171 amino acids

(B) TYPE: amino acid

(ii) MOLECULE TYPE: peptide

(iii) FEATURE:

(A) NAME: von Willebrand factor A domain

DLVFILDGSYSVGPENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYPV 50
LEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAIQFALDYLFAKSSRFLT 100
KIAVVLTDGKSQDDVKDAAQAARDSKITLFAIGVGSETEDAELRAIANKP 150
SSTYVFYVEDYIAISKIREVM 171



SEQ ID NO: 3

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 183 amino acids

(B) TYPE: amino acid

(ii) MOLECULE TYPE: peptide

(iii) FEATURE:

(A) NAME: Thrombospondin N-terminal-like domain

GFDILLGLDVNKKVKKRIQLSPKKIKGYEVTSKVDLSELTSNVFPEGLPP 50
SYVFVSTQRFKVKKIWDLWRILTIDGRPQIAVTLNGVDKILLFTTTSVIN 100
GSQVVTFANPQVKTLFDEGWHQIRLLVTEQDVTLYIDDQQIENKPLHPVL 150
GILINGQTQIGKYSGKEETVQFDVQKLRIYCDP 183



SEQ ID NO: 4

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 509 amino acids

(B) TYPE: amino acid

(ii) MOLECULE TYPE: peptide

(iii) FEATURE:

(C) NAME: collagenous domain

(D) OTHER INFORMATION: /note="Where P=P*=Hydroxyproline"

GKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGYQGIAGTPGVPGSPGIQGA 50
RGLPGYKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGDKGSPGFYG 100
KKGAKGEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQD 150
GTRGEPGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQ 200
PGTPGSKGSKGEPGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKG 250
DKGNQGEKGIQGQKGENGRQGIPGQQGIQGHHGAKGERGEKGEPGVRGAI 300
GSKGESGVDGLMGPAGPKGQPGDPGPQGPPGLDGKPGREFSEQFIRQVCT 350
DVIRAQLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEGPRGLPGLP 400
GRDGVPGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPE 450
GPPGISKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARR 500
DPFRKGPNY 509

SEQ ID NO: 5

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2865 base pairs

(B) TYPE: nucleic acid

(iv) MOLECULE TYPE: cDNA

(v) FEATURE: Alpha 1 chain collagen

atggctcactatattacatttctctgcatggttttggtgctgcttcttcagaattctgtg 60
ttagctgaagatggggaagtaagatcaagttgtcgtactgctccgacagatttagttttc 120
atcttagatggctcttatagtgttggcccagaaaactttgaaatagtgaaaaagtggctt 180
gtcaatatcacaaaaaactttgacatagggccgaagtttattcaagttggagtggttcaa 240
tatagtgactaccctgtgctggagattcctctcggaagctatgattcaggagaacatttg 300
acggcagcagtggaatccatactctacttaggaggaaacacaaagacagggaaggccatc 360
cagtttgcgctcgattacctttttgccaagtcctcacgatttctgactaagatagcagtg 420
gtacttacggatggcaaatcccaagatgacgtcaaggatgcagctcaagcagcaagagat 480
agtaagataacattatttgctattggtgttggttcagaaacagaagatgccgaacttaga 540
gctattgccaacaagccttcgtctacttatgtgttttatgtggaagactatattgcaata 600
tccaaaataagggaagtgatgaagcagaaactttgtgaagaatctgtctgtccaacacga 660
attccagtggcagctcgtgatgaaaggggatttgatattcttttaggtttagatgtaaat 720
aaaaaggttaagaaaagaatacagctttcaccaaaaaagataaaaggatatgaagtaaca 780
tcaaaagttgatttatcagaactcacaagcaatgttttcccagaaggtcttcctccatca 840
tatgtatttgtgtctactcaaagatttaaagtcaagaaaatttgggatttatggagaata 900
ttaactattgatggaaggccacaaatagcagttaccttaaatggtgtggacaaaatctta 960
ttatttacaacaaccagcgtaattaatggctcacaagtggttacctttgctaaccctcaa 1020
gttaagacgttgtttgatgaaggctggcaccaaattcgtctcttagtaacagaacaagat 1080
gtgactttgtatattgatgaccaacaaattgaaaacaagcccttacatccagttttaggg 1140
atcttgatcaatgggcaaacccaaattggaaaatattctggaaaagaagaaactgttcag 1200
tttgatgtccaaaagttgcgaatctactgtgacccagaacagaacaaccgggagacagca 1260
tgtgagattcctggattttgccttaatggtcccagtgatgtaggttcaactccagctccc 1320
tgtatttgtcctccgggaaaaccaggacttcaaggccccaaaggtgaccctggactgcct 1380
gggaaccctggctaccctggacaacctggtcaagatggtaagcctggatatcagggaatt 1440
gcagggacaccaggtgttccaggatctccaggaatacaaggagctcgaggactaccaggt 1500
tacaaaggagaaccagggcgagatggtgacaagggtgatcgtggacttcctggttttcct 1560
gggcttcatggcatgccaggatcaaagggtgaaatgggtgccaaaggagacaaaggatca 1620
cctggattttatggcaaaaagggtgcaaaaggtgaaaaggggaatgctggcttccctggc 1680
ctccctggacctgctggagaaccaggaagacatggaaaggatggattaatgggtagtccc 1740
ggtttcaagggagaagcaggatcccctggtgctccggggcaggatggaacacggggagag 1800
cctggaatcccaggatttcctggaaaccgaggattaatgggccaaaagggagaaattggg 1860
cctccaggacagcaaggaaaaaaaggagccccagggatgcctggtttaatgggaagcaat 1920
ggctcaccaggccagcctggaacaccgggatctaagggaagcaaaggtgaacctggaatt 1980
caagggatgcctggggcttctgggctcaagggagaaccaggagcaacgggttccccagga 2040
gaaccaggatacatgggtttacccgggattcaaggaaaaaagggggacaaaggaaatcaa 2100
ggtgaaaaaggtattcagggtcaaaagggagaaaatggaagacagggaattccagggcaa 2160
cagggaattcaaggccatcatggtgcaaaaggagagagaggtgaaaagggagaacctggt 2220
gtccgaggtgccattggatcaaaaggagaatctggggtggatggcttgatggggcccgca 2280
ggtcctaaggggcaacctggggatccaggtcctcagggacccccaggtttggatgggaag 2340
cccggaagagagttttcagaacaatttattcgacaagtttgcacagatgtaataagagcc 2400
cagctaccagtcttacttcagagtggaagaattagaaattgtgatcattgcctgtcccaa 2460
catggctccccgggtattcctgggccacctggtccgataggcccagagggtcccagagga 2520
ttacctggtttgccaggaagagatggtgttcctggattagtgggtgtccctggacgtcca 2580
ggtgtcagaggattaaaaggcctaccaggaagaaatggggaaaaagggagccaagggttt 2640
gggtatcctggagaacaaggtcctcctggtcccccaggtccagagggccctcctggaata 2700
agcaaagaaggtcctccaggagacccaggtctccctggcaaagatggagaccatggaaaa 2760
cctggaatccaagggcaaccaggccccccaggcatctgcgacccatcactatgttttagt 2820
gtaattgccagaagagatccgttcagaaaaggaccaaactattag 2865